ABSTRACT

Given a multi-modal dynamical system, optimal switching 
logic synthesis involves generating the conditions for switch- 
ing between the system modes such that the resulting hybrid 
system satisfies a quantitative specification. We formalize 
and solve the problem of optimal switching logic synthesis 
for quantitative specifications over long run behavior. Our 
paper generalizes earlier work on synthesis for safety. We 
present an approach for specifying quantitative measures us- 
ing reward and penalty functions, and illustrate its effective- 
ness using several examples. Each trajectory of the system, 
and each state of the system, is associated with a cost. Our 
goal is to synthesize a system that minimizes this cost from 
each initial state. We present an automated technique to 
synthesize switching logic for such quantitative measures. 
Our algorithm works in two steps. For a single initial state,
we reduce the synthesis problem to an unconstrained numerical
optimization problem which can be solved by any off-the-shelf
numerical optimization engine. In the next step, the
optimal switching condition is learned as a generalization of
the optimal switching states discovered for each initial state.
We prove the correctness of our technique and demonstrate 
the effectiveness of this approach with experimental results. 

1. INTRODUCTION 

One of the holy grails in the design of embedded and 
hybrid systems is to automatically synthesize models from 
high-level safety and performance specifications. In general, 
automated synthesis is difficult to achieve, in part because 
design often involves human insight and intuition, and in 
part because of system complexity. Nevertheless, in some 
contexts, it may be possible for automated tools to complete 
partial designs generated by a human designer, enabling the 
designer to efficiently explore the space of design choices 
whilst ensuring that the synthesized system meets its spec- 
ification. 

One such problem is to synthesize the mode switching 
logic for multi-modal dynamical systems (MDS). An MDS
is a physical system (plant) that can operate in different
modes. The dynamics of the plant in each mode is known. 
In order to achieve safe and efficient operation, one needs 
to design the controller for the plant (typically implemented 
in software) that switches between the different operating 
modes. We refer to this problem as switching logic synthesis. 
Designing correct and optimal switching logic can be tricky 
and tedious for a human designer. 

In this paper, we consider the problem of synthesizing 
the switching logic for an MDS so that the resulting sys- 
tem is optimal. Optimality is formalized as minimizing a 
quantitative cost measure over the long-run behavior of the 
system. Specifically, we formulate cost as penalty per unit
reward, motivated by similar cost measures in economics. For
a given initial state, the optimal long-term behavior corresponds
to a trajectory of infinite length, with an infinite number
of mode switches, that has minimum cost. So, discovering
the optimal long-term behavior requires

• discovering this infinite chain of mode switches, and 

• the switching states from one mode to another. 

Thus, this problem would seem to involve optimization over 
an infinitely-long trajectory, involving an unbounded set of 
parameters. However, we reduce this problem to optimization
over a bounded set of parameters representing the repetitive
long-term behavior. The key insight is that the long-term
cost is essentially the cost of the repetitive part of the
behavior. We only require the user to provide a guess of a 
number of switches which could suffice to reach the repetitive 
behavior from an initial state. The system stays in repeti- 
tive behavior after reaching it and hence, the user can pick 
any large enough bound. We consider the supersequence of 
all possible mode sequences with the given number of mode 
switches and use the times spent in each mode in this su- 
persequence as the parameters for optimization. If the time 
spent in a particular mode is zero, the mode is removed from 
the optimum mode sequence. The optimization problem is 
then formulated as an unconstrained numerical optimization 
problem which can be solved by off-the-shelf tools. Solv- 
ing this optimization problem yields the time spent in each 
mode which in turn gives us the optimum mode switching 
sequence. So, to summarize, for a given initial state, we ob- 
tain a sequence of switching states at which mode transitions 
must occur so as to minimize the long-run cost. The final 
step involves generalizing from a sample of switching states 
to a switching condition, or guard. Given an assumption on 
the structure of guards, an inductive learning algorithm is 
used to combine switching states for different initial states 
to yield the optimum switching logic for the entire hybrid
system.

To summarize, the novel contributions of this paper are 
as follows: 

• We formalize the problem of synthesizing optimal switch- 
ing logic by introducing the notion of long-run cost which 
needs to be minimized for optimality (Section [2]). 

• The synthesis problem requires optimization over infinite 
trajectories and not just a finite time horizon. We show 
how to reduce optimization over an infinite trajectory to 
an equivalent optimization over a bounded set of parameters
representing the limit behavior (Section [4]);

• We present an algorithm to solve this optimization problem
for a single initial state based on unconstrained numerical
optimization (Section [5]). Our algorithm makes
no assumptions on the intra-mode continuous dynamics 
other than locally-Lipschitz continuity and relies only on 
the ability to accurately simulate the dynamics, making 
it applicable even for nonlinear dynamics; 

• An inductive learning algorithm based on randomly sampling
initial states is used to generalize from optimal switching
states for individual initial states to an optimal switching
guard for the set of all initial states. This generated
switching logic is guaranteed to be the true optimal
switching logic with high probability (Section [6]).

Experimental results demonstrate our approach on a range
of examples drawn from embedded systems design (Section [7]).

2. PROBLEM DEFINITION 

2.1 Multimodal and Hybrid Systems 

We model a hybrid system as a combination of a multi- 
modal dynamical system and a switching logic. 

Definition 1. Multimodal Dynamical System (MDS).

A multimodal dynamical system is a tuple $(Q, X, f, Init)$,
where $Q := \{1, \ldots, N\}$ is a set of modes, $X := \{x_1, \ldots, x_n\}$
is a set of continuous variables, $f : Q \times \mathbb{R}^n \to \mathbb{R}^n$ defines a
vector field for each mode in $Q$, and $Init \subseteq Q \times \mathbb{R}^n$ is a set
of initial states. The state space of such an MDS is $Q \times \mathbb{R}^n$.
A function $qx : \mathbb{R}^+ \to (Q \times \mathbb{R}^n)$ is said to be a trajectory of
this MDS with respect to a sequence $t_1, t_2, \ldots$ of switching
times if

(i) $qx(0) \in Init$ and

(ii) for all $i$ and for all $t$ such that $t_i < t$ and $t < t_{i+1}$, it is
the case that $q(t) = q(t_i)$ and

$$\frac{dx(t)}{dt} = f(q(t), x(t)), \qquad (1)$$

where $q$ and $x$ denote the projection of $qx$ into the mode
and continuous state components. The function $x$ is continuous.
The switching sequence is the sequence of modes
$q(t_1), q(t_2), \ldots$.

If MDS is a multimodal dynamical system, then its semantics, 
denoted [MDS], is the set of its trajectories with respect to 
all possible switching time sequences. 
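As an illustration of Definition 1, the following is a minimal sketch of how a single trajectory of an MDS could be computed for a given switching time sequence. The helper names (`rk4_step`, `simulate_mds`, `toy_field`) and the two-mode vector field are hypothetical and chosen only for this example; a fixed-step Runge-Kutta integrator is assumed in place of whatever simulator is actually used.

```python
import math

def rk4_step(f, mode, x, h):
    """Advance dx/dt = f(mode, x) by one classical Runge-Kutta step of size h."""
    k1 = f(mode, x)
    k2 = f(mode, [xi + 0.5 * h * k for xi, k in zip(x, k1)])
    k3 = f(mode, [xi + 0.5 * h * k for xi, k in zip(x, k2)])
    k4 = f(mode, [xi + h * k for xi, k in zip(x, k3)])
    return [xi + h / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for xi, a, b, c, d in zip(x, k1, k2, k3, k4)]

def simulate_mds(f, q0, x0, switches, t_end, h=1e-3):
    """Return the hybrid state (q, x) at time t_end.

    switches is a time-ordered list of (t_i, q_i): at time t_i the mode
    becomes q_i while the continuous state x is left unchanged.
    Switches scheduled at or after t_end are ignored.
    """
    q, x, t = q0, list(x0), 0.0
    for t_stop, q_next in list(switches) + [(t_end, None)]:
        t_stop = min(t_stop, t_end)
        while t_stop - t > 1e-12:
            step = min(h, t_stop - t)
            x = rk4_step(f, q, x, step)
            t += step
        if q_next is None or t_stop >= t_end:
            break
        q = q_next
    return q, x

def toy_field(mode, x):
    # Hypothetical dynamics: exponential decay in mode 1, constant growth in mode 2.
    return [-x[0]] if mode == 1 else [1.0]

# Start in mode 1 at x = 1, switch to mode 2 at t = 1, stop at t = 2.
q_final, x_final = simulate_mds(toy_field, 1, [1.0], [(1.0, 2)], 2.0)
print(q_final, x_final[0], math.exp(-1.0) + 1.0)
```

The continuous state is carried across the switch unchanged, which matches the requirement that $x$ be continuous.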

Definition 2. Switching Logic (SwL). A switching logic
for a multimodal system $MDS := (Q, X, f, Init)$ is a tuple
$(g_{q_1 q_2})_{q_1, q_2 \in Q}$ where $g_{q_1 q_2} \subseteq \mathbb{R}^n$ is the guard defining the
switch from mode $q_1$ to mode $q_2$.



Given a multimodal system and a switching logic, we can 
now define a hybrid system by considering only those tra- 
jectories of the multimodal system that are consistent with 
the switching logic. 

Definition 3. Hybrid System (HS). A hybrid system is
a tuple $(MDS, SwL)$ consisting of a multimodal system $MDS :=
(Q, X, f, Init)$ and a switching logic $SwL := (g_{q_1 q_2})_{q_1, q_2 \in Q}$.
The state space of the hybrid system is the same as the state
space of MDS. A function $qx : \mathbb{R}^+ \to (Q \times \mathbb{R}^n)$ is said to
be a trajectory of this hybrid system if there is a sequence
$t_1, t_2, \ldots$ of switching times such that

(a) $qx$ is a trajectory of MDS with respect to this switching
time sequence and

(b) setting $t_0 = 0$, for all $t_i$ in the switching time sequence
with $i \ge 1$, $x(t_i) \in g_{q(t_{i-1}) q(t_i)}$ and for all $t$ such that
$t_{i-1} < t < t_i$, $x(t) \notin \bigcup_{q \in Q} g_{q(t_{i-1}) q}$.

Discrete jumps are taken as soon as they are enabled and 
they do not change the continuous variables. For the notion 
of a trajectory to be well-defined, guards are required to be 
closed sets. The semantics of a hybrid system HS, denoted 
[HS], is the collection of all its trajectories as defined above. 

2.2 Quantitative Measures for Hybrid Systems 

Our interest is in automatically synthesizing hybrid sys- 
tems which are optimal in the long-run. We define a quanti- 
tative measure on a hybrid system HS by extending HS with 
new continuous state variables. The new continuous vari- 
ables compute "rewards" or "penalties" that are accumulated 
over the course of a hybrid trajectory. We also allow the new 
variables to be updated during discrete transitions, which 
enables us to penalize or reward discrete mode switches. 

Definition 4. Performance Metric. A performance metric
for a given multimodal system $MDS := (Q, X, f, Init)$ is
a tuple $(PR, f_{PR}, update)$, where $PR := P \cup R$ is a finite set
of continuous variables (disjoint from $X$), partitioned into
penalty variables $P$ and reward variables $R$, $f_{PR} : Q \times \mathbb{R}^n \to
\mathbb{R}^m$ defines the vector field that determines the evolution of
the variables $PR$, and $update : Q \times Q \times \mathbb{R}^m \to \mathbb{R}^m$ defines
the updates to the variables $PR$ at mode switches.

Given a trajectory $qx : \mathbb{R}^+ \to (Q \times \mathbb{R}^n)$ of a multimodal or
hybrid system with mode-switching time sequence $t_1, t_2, \ldots$
and given a performance metric, we define the extended trajectory
$qx^e : \mathbb{R}^+ \to (Q \times \mathbb{R}^n \times \mathbb{R}^m)$ with respect to the same
mode-switching time sequence as any function that satisfies
$qx^e(0) = (q(0), x(0), 0)$ and $qx^e(t) = (q(t), x(t), PR(t))$,
where $PR$ satisfies $\frac{dPR(t)}{dt} = f_{PR}(qx(t))$ for all $t$ with $t_i < t < t_{i+1}$,
and $PR(t_i) = update(q(t_{i-1}), q(t_i), \lim_{t \to t_i^-} PR(t))$.

The cost of a trajectory $qx$ is defined using its corresponding
extended trajectory $qx^e$ as

$$cost(qx) := \lim_{t \to \infty} \sum_{i=1}^{|P|} \frac{P_i(t)}{R_i(t)} \qquad (2)$$

where $P_i$ and $R_i$ are the projection of $qx^e$ onto the $i$-th
penalty variable and $i$-th reward variable, and $|P| = |R|$.

We are only interested in trajectories where the above
limit exists and is finite. We will further define the cost of a
part of a trajectory from time instant $t_1$ to a time instant
$t_2$ ($t_2 > t_1$) as follows:

$$cost(qx, t_1, t_2) := \sum_{i=1}^{|P|} \frac{P_i(t_2) - P_i(t_1)}{R_i(t_2) - R_i(t_1)} \qquad (3)$$

where $P_i$ and $R_i$ are components of $PR$ as before.
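The segment cost of Equation (3) is a plain sum of ratios of increments, which the following sketch makes concrete. The function name `segment_cost` and the sample values are hypothetical; the samples stand for the penalty and reward variables read off an extended trajectory at the two time instants.

```python
def segment_cost(pr_start, pr_end):
    """Equation (3): sum_i (P_i(t2) - P_i(t1)) / (R_i(t2) - R_i(t1)).

    pr_start and pr_end are lists of (P_i, R_i) pairs sampled at t1 and t2.
    """
    total = 0.0
    for (p1, r1), (p2, r2) in zip(pr_start, pr_end):
        if r2 == r1:
            raise ValueError("reward must change over the segment")
        total += (p2 - p1) / (r2 - r1)
    return total

# Hypothetical samples: the first pair counts switches against elapsed time,
# the second accumulates consumed energy against elapsed time.
at_t1 = [(2.0, 4.0), (10.0, 4.0)]
at_t2 = [(5.0, 10.0), (25.0, 10.0)]
print(segment_cost(at_t1, at_t2))  # 3/6 + 15/6 = 3.0
```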

As the definition of cost indicates, we are interested in 
the long-run average (penalty per unit reward) cost rather 
than (penalty or reward) cost over some bounded/finite time 
horizon. Some examples of auxiliary performance variables 
(PR) and cost function are described below. 

• the number of switches that take place in a trajectory
can be tracked by defining an auxiliary variable $p_1$ that
has dynamics $\frac{dp_1}{dt} = 0$ at all points in the state space,
and that is incremented by 1 at every mode switch;
that is,

$p_1(t_i) = update(q, q', p_1(t_i^-)) = p_1(t_i^-) + 1$

• the time elapsed since start can be tracked by defining
an auxiliary variable $r_1$ that has dynamics $\frac{dr_1}{dt} = 1$ at
all points and that is left unchanged at discrete transitions;
that is,

$r_1(t_i) = update(q, q', r_1(t_i^-)) = r_1(t_i^-)$

• the average switchings (per unit time) can be observed
to be $\frac{p_1}{r_1}$. If this cost becomes unbounded as the time
duration of a trajectory increases, then this indicates
zeno behavior. Thus, if we use $p_1$ and $r_1$ as the penalty
and reward variables in the performance metric, then
we are guaranteed that non-zeno systems will have
"smaller" cost and thus be "better".

• the power consumed could change in different modes of 
a multimodal system and an auxiliary (penalty) vari- 
able can track the power consumed in a particular tra- 
jectory. 

• the distance from unsafe region can be tracked by an 
auxiliary reward variable that evolves based on the dis- 
tance of the current state from the closest unsafe state. 
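The switch-count and elapsed-time variables above can be sketched directly from a list of switching times. The function name `switching_rate` and both switching sequences are hypothetical; the second sequence is a truncation of a Zeno-like schedule whose switches accumulate before time 1.

```python
def switching_rate(switch_times, t):
    """p1(t) / r1(t): number of switches up to time t, per unit of elapsed time."""
    p1 = sum(1 for s in switch_times if s <= t)  # incremented by 1 at every switch
    r1 = t                                       # dr1/dt = 1 with r1(0) = 0
    return p1 / r1

periodic = [0.5 * i for i in range(1, 201)]      # one switch every 0.5 time units
zeno = [1.0 - 2.0 ** -i for i in range(1, 41)]   # switches accumulate before t = 1

print(switching_rate(periodic, 100.0))
print([switching_rate(zeno[:n], 1.0) for n in (10, 20, 40)])
```

For the periodic schedule the ratio settles at a constant, whereas for the accumulating schedule it grows with the number of switches included, which is the unbounded-cost signature of zeno behavior described above.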

2.3 Optimal Switching Logic Synthesis 

Definition 5. Optimal Switching Synthesis Problem.

Given a multimodal system $MDS = (Q, X, f, Init)$, and a
performance metric, the optimal switching logic synthesis
problem seeks to find a switching logic $SwL^*$ such that the
cost of a trajectory from any initial state in the resulting
hybrid system $HS^* := HS(MDS, SwL^*)$ is no more than the
cost of the corresponding trajectory from the same initial state
in an arbitrary hybrid system $HS := HS(MDS, SwL)$ obtained
using an arbitrary switching logic $SwL$, that is, $\forall (q, x) \in
Init \;.\; cost(qx^*) \le cost(qx)$ where $qx^*(0) = qx(0) =
(q, x)$, $qx^* \in [HS^*]$, $qx \in [HS]$.

We will assume, without loss of any generality, that we are
given an over-approximation of the switching logic $SwL^{over} :=
(g^{over}_{qq'})_{q, q' \in Q}$. In this case, the optimal synthesis problem
seeks to find a switching logic $SwL^* := (g^*_{qq'})_{q, q' \in Q}$ that
also satisfies the constraint that $g^*_{qq'} \subseteq g^{over}_{qq'}$ for all $q, q' \in
Q$, which is also written in short as $SwL^* \subseteq SwL^{over}$.

The over-approximation $SwL^{over}$ of the switching set can
be used to restrict the search space for switching conditions.
The set $g^{over}_{qq'}$ can be an empty set if switches are disallowed
from $q$ to $q'$. The set $g^{over}_{qq'}$ can be $\mathbb{R}^n$ if there is no restriction
on switching from $q$ to $q'$.

2.4 Running Example 

Let us consider a simple three mode thermostat controller
as our running example. The multimodal dynamical system
describing this system is presented in Figure [1]. The thermostat
controller is described by the tuple $(Q, X, f, Init)$
where $Q = \{OFF, HEAT, COOL\}$, $X = \{temp, out\}$, $f$ is $f_{OFF}$:
$\dot{temp} = -0.1(temp - out)$ in mode OFF, $f_{HEAT}$: $\dot{temp} =
-0.1(temp - out) + 0.05(80 - temp)$ in mode HEAT and $f_{COOL}$:
$\dot{temp} = -0.1(temp - out) + 0.15(temp)$ in mode COOL, and
$Init = OFF \times [18, 20] \times [12, 26]$. For simplicity, we assume
that the outside temperature out does not change.



COOL: $\dot{temp} = -0.1(temp - out) + 0.15(temp)$
HEAT: $\dot{temp} = -0.1(temp - out) + 0.05(80 - temp)$
OFF: $\dot{temp} = -0.1(temp - out)$
Guards labeling the mode switches: $g_{FC}$, $g_{HF}$, $g_{FH}$

$\dot{discomfort} = (temp - 20)^2$, $\dot{fuel} = (temp - out)^2$,
$\dot{swTear} = 0$, $\dot{time} = 1$
$update(M, M', swTear) = swTear + 0.5$
for any two different modes $M, M'$ in $Q$

Figure 1: Thermostat Controller

The performance requirement is to keep the temperature
as close as possible to the target temperature 20 and to
consume as little fuel as possible in the long run. We also
want to minimize the wear and tear of the heater caused
by switching. The performance metric is given by the tuple
$(PR, f_{PR}, update)$, where the penalty variables $P = \{discomfort,
fuel, swTear\}$ denote the discomfort, fuel and wear-tear due
to switching and the reward variable $R = \{time\}$ denotes the
time spent. The evolution and update functions for the
penalty and reward variables are shown in Figure [1]. We need
to synthesize the guards such that the following cost metric
is minimized:

$$\lim_{t \to \infty} \frac{10 \times discomfort(t) + fuel(t) + swTear(t)}{time(t)}$$

Since the reward variable is the time spent,
minimizing this metric means minimizing the average discomfort,
fuel cost and wear-tear of the heater. We give a
higher weight (10) to discomfort than to fuel cost and wear-tear.


3. RELATED WORK 

There is a lot of work on synthesis of controllers for hy- 
brid systems, which can be broadly classified along several 
different dimensions. First, based on the property of inter- 
est, synthesis work broadly falls into one of two categories. 
The first category finds controllers that meet some liveness
specifications, such as synthesizing a trajectory to drive a hybrid
system from an initial state to a desired final state [22]
[19], while also minimizing some cost metric [6]. The second
category finds controllers that meet some safety specification;
see [1] for detailed related work in this category.
Our previous work [16] based on combining simulation and
algorithmic learning also falls in this category. Purely
constraint-based approaches for solving the switching logic synthesis
problem have also been used for reachability specifications
[30]. While our work does not directly consider only
safety or only liveness requirements, both these requirements 
can be suitably incorporated into the definition of "reward" 
and "penalty" functions that define the cost that our ap- 
proach then optimizes. While optimal control problems for 
hybrid systems have been formulated where cost is defined 
over some finite trajectory, we are unaware of any work in 
control of hybrid systems that attempts to formulate and 
solve the optimal control problem for long-run costs. 

The second dimension that differentiates work on con- 
troller synthesis for hybrid systems is the space of control in- 
puts considered; that is, what is assumed to be controllable. 
The space of controllable inputs could consist of any com- 
bination of continuous control inputs, the mode sequence, 
and the dwell times within each mode. A recent paper by
Gonzales et al. [13] considers all three control parameters,
whereas some other works either assume the mode sequence
is not controllable [33] [28] or that there are no continuous control
inputs [2]. In our work, we assume there are no continuous
control inputs and both the mode sequence and the dwell 
time within each mode are controllable entities. 

The third dimension for placing work on controller syn- 
thesis of hybrid systems is the approach used for solving the 
synthesis problem. There are direct approaches for synthesis
that compute the controlled reachable states in the style
of solving a game [1] [31], and abstraction-based approaches
that do the same, but on an abstraction or approximation
of the system [25] [10] [29]. Some of these approaches are
limited in the kinds of continuous dynamics they can handle.
They all require some form of iterative fixpoint computation.
The other class of approaches is based on using
nonlinear optimization techniques and gradient descent [13]
[2]. Axelsson et al. [3] use a bi-level hierarchical optimization
algorithm with the higher level used for finding the optimal
mode sequence employing a single mode insertion technique,
and the lower level used to find the switching times that minimize
the cost function for a fixed mode sequence. Gonzales
et al. [13] [12] extended the technique by also considering control
inputs apart from mode sequence and switching times.
Their approach [12] can handle multiple objectives in the
cost function, can be initialized at an infeasible point and 
can include switching costs. We also use similar techniques 
in this paper and reduce the controller synthesis problem to 
an optimization problem of a function that is computed by 
performing simulations of the dynamical system. 

Notions of long-run cost similar to ours have appeared in 
other areas. The notion of long-run average cost is used 
in economics to describe the cost per unit output (reward) 
in the long-run. In computer science, long-run costs have 
been studied for graph optimization problems [18]. Long-run
average objectives have also been studied for Markov
decision processes (MDPs) [8] [20]. However, MDPs do not
have any continuous dynamics. Another related work is optimal
scheduling using priced timed automata [27], in
which timed automata are extended by associating a fixed
cost to each transition and a fixed cost rate per time unit
in a location. We consider multi-modal dynamical systems
with possibly non-linear dynamics and our cost rates are
functions of the continuous variables. Further, our interest
is in long-run cost. There is some recent work on controller
synthesis with budget constraints where the budget applies
in the long-run [9].

In contrast to existing literature, we present an automated 
synthesis algorithm to synthesize switching logic SwL for a 
given MDS and performance metric such that all trajectories 
in the hybrid system HS(MDS, SwL) have minimum long-term 
cost with respect to the given performance metric. 

4. OPTIMIZATION FORMULATION 

In this section, we formulate the problem of finding switch- 
ing logic for minimum long-run cost from an initial state 
as an optimization problem. Given a multimodal system
$MDS = (Q, X, f, Init)$, an initial state $(q_0, x_0) \in Init$
and the performance metric tuple $(PR, f_{PR}, update)$, we need
to find the switching times $t_1, t_2, \ldots$ and the mode switching
sequence $q$ such that the corresponding trajectory $qx$ is of
minimum cost.

$$\min_{q, t_1, t_2, \ldots} cost(qx) \quad \text{subject to}$$

(1) [Init]: $qx(0) = (q_0, x_0)$
(2) [Guards]: $\forall i \;.\; x(t_i) \in g^{over}_{q(t_i) q(t_{i+1})}$
(3) [Time elapse]: $\forall t \;.\; t_i < t < t_{i+1} \;.\; q(t) = q(t_i)$, $i = 1, 2, \ldots$
(4) [Flow]: $\frac{dx(t)}{dt} = f(q(t), x(t))$ $\qquad (4)$

Since the switching sequence $t_1, t_2, \ldots$ could be of infinite
length, it is not a priori evident how to solve the above problem.
In the rest of the section, we formulate an equivalent
optimization problem with a finite number of switching times
as variables.

Let trajectory segment $qx_{[t_f, t_e]}$ of a trajectory $qx$ of length
$L = t_e - t_f$ be the restriction of the trajectory to $t_f \le t \le t_e$,
that is, $qx_{[t_f, t_e]} : T \to (Q \times \mathbb{R}^n)$ where $T = [t_f, t_e] \subseteq \mathbb{R}^+$ and
$qx_{[t_f, t_e]}(t) = qx(t)$ for $t_f \le t \le t_e$. The switching times of
the trajectory segment are a finite subsequence $t_m, t_{m+1}, \ldots, t_n$
of the switching times $t_1, \ldots, t_m, \ldots, t_n, \ldots$ of the trajectory
$qx$ with $t_{m-1} < t_f \le t_m$ and $t_n \le t_e < t_{n+1}$. The special
case of a trajectory segment is a trajectory prefix, in which
the trace starts at time $t_f = 0$.

Our goal is to minimize the lifetime cost. The lifetime
cost is dominated by the cost of the limit behavior of
the system. We are only interested in the following stable
limit behaviors (discussed in further detail in the extended
version [17]) when the lifetime cost is defined by the limit in
Equation [2]:

• asymptotic: for any $\epsilon$, there exists a time $t_\epsilon$ after which
the trajectory gets asymptotically $\epsilon$-close to some state
$(q_T, x_T)$, $\|qx(t) - (q_T, x_T)\|^2 < \epsilon$ for all $t \ge t_\epsilon$, where
$\|\cdot\|$ denotes the Euclidean norm, or

• converging: there exists a time $t_{conv}$ after which the
trajectory converges, $qx(t) = qx(t_{conv})$ for all $t \ge t_{conv}$, or

• cyclic: there exists a time $t_{cyc}$ after which the trajectory
enters a cycle with period $P$, $qx(t) = qx(t + kP)$
for all $t \ge t_{cyc}$ and $k \ge 1$.

In all these cases, we can reason about the long-run cost
by considering some finite, but arbitrarily long, trajectory
prefixes. Suppose the trajectory $qx$ is asymptotic to some
hybrid state $qx^\infty = (q^\infty, x^\infty)$. In this case, we assume
that the penalty and reward variables $PR$ also asymptotically
approach some values $PR^\infty = (P^\infty, R^\infty)$. Now consider the
trajectory prefix $qx_{[0, t_e]}$. We have

$$cost(qx) = \sum_{i=1}^{|P|} \frac{P_i^\infty}{R_i^\infty} \quad \text{and} \quad cost(qx_{[0, t_e]}) = \sum_{i=1}^{|P|} \frac{P_i(t_e)}{R_i(t_e)} \le cost(qx) + \delta_\epsilon$$

Hence, by choosing $t_e$ appropriately, we can find a trajectory
prefix whose cost is arbitrarily close to the cost of the
asymptotic trajectory.

Any repetitive trajectory $qx$ can be decomposed into a
finite prefix $qx_{pref} = qx_{[0, t_p]}$ followed by a trajectory segment
$qx_{rep} = qx_{[t_p, t_P]}$ repeated infinitely. We say

$$qx = qx_{pref} \cdot (qx_{rep})^\omega$$

when $\forall t \le t_p \;.\; qx(t) = qx_{pref}(t)$ and $\forall t > t_p \;.\; qx(t) =
qx_{rep}(t_p + \tau)$ where $\tau = (t - t_p) \bmod L$ and $L = t_P - t_p$.
The case when the trajectory converges to some hybrid state
can be treated in the same way as a repetitive trajectory.
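The decomposition into a prefix followed by an infinitely repeated segment can be sketched as a function that replays the segment with period $L = t_P - t_p$. The name `repetitive_trajectory` and the one-dimensional prefix and segment below are hypothetical, chosen so that the hybrid states at $t_p$ and $t_P$ coincide.

```python
import math

def repetitive_trajectory(prefix, rep, t_p, t_P):
    """Build qx = prefix . (rep)^omega.

    prefix(t) is used for t <= t_p; afterwards rep, defined on [t_p, t_P],
    is replayed with period L = t_P - t_p.
    """
    L = t_P - t_p

    def qx(t):
        if t <= t_p:
            return prefix(t)
        tau = (t - t_p) % L
        return rep(t_p + tau)

    return qx

# Hypothetical example: ramp from 0 to 4 over two time units,
# then follow a cycle of period 3 that starts and ends at x = 4.
prefix = lambda t: (1, 2.0 * t)
rep = lambda t: (1, 3.0 + math.cos(2.0 * math.pi * (t - 2.0) / 3.0))
qx = repetitive_trajectory(prefix, rep, 2.0, 5.0)

print(qx(1.0), qx(2.0), qx(3.0), qx(6.0), qx(9.0))
```

Evaluating at times that differ by a multiple of the period returns the same hybrid state, which is the property used in the cost argument that follows.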

In Lemma [1] and Theorem [1], we summarize how cost converges
to a limit for trajectories with repetitive limit behavior.

Lemma 1. For each repetition of the segment $qx_{rep} =
qx_{[t_p, t_P]}$, the change in penalty and reward variables is constant,
that is, for $P = t_P - t_p$,

$$\forall k \ge 1 \;.\; P_i(t_p + kP) - P_i(t_p + (k-1)P) = P_i(t_p + P) - P_i(t_p) = \Delta P_i$$

$$\forall k \ge 1 \;.\; R_i(t_p + kP) - R_i(t_p + (k-1)P) = R_i(t_p + P) - R_i(t_p) = \Delta R_i$$

Proof. The change in penalty and reward variables is
given by the evolution function $f_{PR}$ and the update on switch
function $update$. We know that $qx_{rep}$ is repetitive and so,
$qx(t_p + kP + t) = qx(t_p + t)$ for all $t \le t_P - t_p$ and $k \ge 1$
and hence,

$$f_{PR}(qx(t_p + kP + t)) = f_{PR}(qx(t_p + t))$$

Also, for any mode switch time $t_p \le t_i \le t_P$, $t'_i = t_i + kP$
is also a switch time because the hybrid states at $t_i$ and $t'_i$ are
the same. Further,

$$update(q(t_{i-1}), q(t_i), \lim_{t \to t_i^-} PR(t)) = update(q(t'_{i-1}), q(t'_i), \lim_{t \to t_i'^-} PR(t))$$

So, integrating $f_{PR}$ over continuous evolution and applying the
update function at mode switches, we observe that

$$P_i(t_p + kP) - P_i(t_p + (k-1)P) = P_i(t_p + P) - P_i(t_p) = \Delta P_i$$

$$R_i(t_p + kP) - R_i(t_p + (k-1)P) = R_i(t_p + P) - R_i(t_p) = \Delta R_i$$

□

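Theorem 1. For a trajectory $qx$ which can be decomposed
into $qx_{pref} \cdot (qx_{rep})^\omega$, the cost of the trajectory is
equal to the cost of the repetitive segment $qx_{rep}$, that is,

$$cost(qx) = cost(qx_{rep})$$

Proof.

$$
\begin{aligned}
cost(qx) &= \lim_{t \to \infty} \sum_{i=1}^{|P|} \frac{P_i(t)}{R_i(t)} && \text{[Equation [2]]} \\
&= \lim_{t \to \infty} \sum_{i=1}^{|P|} \frac{P_i(t_p) + P_i(t) - P_i(t_p)}{R_i(t_p) + R_i(t) - R_i(t_p)} \\
&= \lim_{k \to \infty} \sum_{i=1}^{|P|} \frac{P_i(t_p) + k \Delta P_i}{R_i(t_p) + k \Delta R_i} && \text{[Lemma [1]]} \\
&= \sum_{i=1}^{|P|} \frac{\Delta P_i}{\Delta R_i} && \text{[} P_i(t_p), R_i(t_p) \text{ are finite]} \\
&= \sum_{i=1}^{|P|} \frac{P_i(t_P) - P_i(t_p)}{R_i(t_P) - R_i(t_p)} \\
&= cost(qx, t_p, t_P) && \text{[Equation [3]]} \\
&= cost(qx_{rep}) && \text{[Definition of } qx_{rep} \text{]}
\end{aligned}
$$

□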
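The limit step in the proof can be checked numerically for a single penalty/reward pair: the values accumulated over the prefix are washed out as the number of repetitions grows. The function names and all numbers below are hypothetical and serve only to illustrate the convergence.

```python
def cost_after_cycles(pr_at_tp, deltas, k):
    """sum_i (P_i(t_p) + k*dP_i) / (R_i(t_p) + k*dR_i) after k repetitions."""
    return sum((p + k * dp) / (r + k * dr)
               for (p, r), (dp, dr) in zip(pr_at_tp, deltas))

def cycle_cost(deltas):
    """sum_i dP_i / dR_i, the cost of one repetition of the segment."""
    return sum(dp / dr for dp, dr in deltas)

pr_at_tp = [(5.0, 2.0)]   # hypothetical values accumulated over the prefix
deltas = [(3.0, 4.0)]     # hypothetical increments per repetition

for k in (1, 10, 100, 10000):
    print(k, cost_after_cycles(pr_at_tp, deltas, k))
print("limit:", cycle_cost(deltas))
```

With these numbers the running cost decreases from 8/6 toward the per-cycle ratio 3/4, independently of the prefix values.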



Using Theorem [1], the optimization problem in Equation [4]
is equivalent to the following optimization problem. Intuitively,
if the repetitive part of the trajectory and the finite
prefix before the repetitive part have finite cost, then the
long run cost of a trajectory in the limit is the cost of the
repetitive part of the trajectory. More generally, to also handle
the case when the (optimal) trajectory is asymptotic, we
can replace the cyclicity requirement, $qx(t_p) = qx(t_P)$, in
the optimization problem by the weaker requirement that
the state $qx(t_P)$ at time $t_P$ be very "close" to the state
$qx(t_p)$ at time $t_p$; see also Section [5.1].

$$\min_{q, t_1, t_2, \ldots} cost(qx) \quad \text{subject to}$$

(1) [Init]: $qx(0) = (q_0, x_0)$
(2) [Guards]: $\forall i \;.\; x(t_i) \in g^{over}_{q(t_i) q(t_{i+1})}$
(3) [Time elapse]: $\forall t \;.\; t_i < t < t_{i+1} \;.\; q(t) = q(t_i)$, $i = 1, 2, \ldots$
(4) [Flow]: $\forall t \;.\; \frac{dx(t)}{dt} = f(q(t), x(t))$
(5) [Repetitive Trajectory]: $qx = qx_{pref} \cdot (qx_{rep})^\omega$
(6) [Repetitive Time]: $qx_{pref} = qx_{[0, t_p]}$, $qx_{rep} = qx_{[t_p, t_P]}$
where $0 \le t_1 \le \ldots \le t_n \le t_P$, $0 \le t_p < t_P$ $\qquad (5)$

5. OPTIMIZATION ALGORITHM 

In this section, we present an algorithm to solve the above
optimization problem. The key idea is to construct a scalar
function $F(q, t_1, t_2, \ldots, t_n, t_p, t_P)$ where $q$ is the switching
mode sequence, $t_1, t_2, \ldots, t_n$ are the switching times, and
$t_p, t_P$ are the times denoting repetitive behavior, such that
the minimum value of $F$ is attained when the switching
mode sequence and switching times correspond to the trajectory
$qx$ with minimum long-run cost, and $qx_{[t_p, t_P]}$ is the
repetitive part of the trajectory.

Once we have constructed $F$, we need to minimize $F$.
Apart from $q$, all arguments of $F$ are real-valued. Suppose
we fix $q$ and let $F_q(t_1, t_2, \ldots, t_n, t_p, t_P)$ denote the function
$F$ with fixed mode sequence $q$. Now $F_q$ is a function
from multiple real variables to a real, and hence (approximate)
minimization of $F_q$ can be performed using unconstrained
nonlinear numerical optimization techniques [5].
These techniques only require that we are able to evaluate
$F$ once its arguments are fixed. This we accomplish using
numerical simulation of the multimodal system.^1

5.1 Defining F

The optimization problem in Equation [5] is a constrained
optimization problem. The constraint $qx = qx_{pref} \cdot (qx_{rep})^\omega$
requires identifying a trajectory $qx$ starting from the given
initial state $(q_0, x_0)$ such that it enters repetitive behavior
at time $t_p$, and $q(t_p) = q(t_P)$ and $x(t_p) = x(t_P)$ where
$t_p < t_P$. We call this constraint the repetition constraint.
A standard technique for solving some constrained optimization
problems is to translate them into an unconstrained
optimization problem by modifying the optimization objective
such that optimization automatically enforces the constraint.
This is done by quantifying the violation using some
metric and then minimizing the sum of the earlier minimization
objective and the weighted violation measure. A
simple example of this approach is presented in the full version
[17]. In order to enforce the repetition constraint by
suitably modifying the optimization objective, we introduce
a distance function between the hybrid states. Let $d$ be the
distance function between two hybrid states such that

$$d((q_1, x_1), (q_2, x_2)) = \begin{cases} \|x_1 - x_2\|^2 & \text{if } q_1 = q_2 \\ \infty & \text{otherwise} \end{cases}$$

where $\|x_1 - x_2\|$ is the Euclidean norm. $(Q \times X, d)$ forms a
metric space. So, the distance between the hybrid states is
0 if and only if $q_1 = q_2$ and $x_1 = x_2$.
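Let

$$F(q, t_1, \ldots, t_n, t_p, t_P) = \begin{cases} cost(qx_{[t_p, t_P]}) + M \times d(qx(t_p), qx(t_P)) & \text{if (a) } 0 \le t_1 \le \ldots \le t_n \le t_P,\ t_p < t_P \text{ and (b) } x(t_i) \in g^{over}_{q(t_i) q(t_{i+1})} \text{ for all } i \\ \infty & \text{otherwise} \end{cases}$$

where $M$ is any positive constant and $qx$ is a trajectory
starting from the given initial state, that is,

$qx(0) = (q_0, x_0)$; $\forall t \;.\; t_i < t < t_{i+1} \;.\; q(t) = q(t_i)$, $i = 1, 2, \ldots$;
and $\forall t \;.\; \frac{dx(t)}{dt} = f(q(t), x(t))$.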

It is easy to see that the minimum value of the function $F$
is attained when the hybrid states at time $t_p$ and $t_P$ are the
same, that is, the trajectory segment $qx_{[t_p, t_P]}$ is the repetitive
part of the trajectory and the cost of this segment is
minimum. Using Theorem [1], we conclude that the optimization
problem in Equation [5] of Section [4] can be reduced
to the following unconstrained multivariate numerical optimization
problem:

$$\min_{t_1, \ldots, t_n, t_p, t_P} F(t_1, \ldots, t_n, t_p, t_P) \qquad (6)$$

As remarked above, if the arguments of $F$ are fixed, then
$F$ can be evaluated using a numerical simulator. Also, for
a fixed $q$, we can use a numerical nonlinear optimization
engine to find the minimum value of the function $F_q$.
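To make the construction of $F_q$ concrete, the sketch below evaluates it for the thermostat with the fixed mode sequence OFF, HEAT, OFF. Several things here are assumptions rather than details taken from the text: the penalty rates of Figure 1 are applied in every mode, a fixed-step Runge-Kutta integrator stands in for the numerical simulator, the constants $M = 1000$ and 2000 are arbitrary large values, and all function names are hypothetical. No optimizer is included; any derivative-free minimizer could be applied to `F`.

```python
import math

OUT = 16.0  # outside temperature, held constant

def flow(mode, s):
    """Rates of (temp, discomfort, fuel); assumed to hold in every mode."""
    temp = s[0]
    dtemp = -0.1 * (temp - OUT)
    if mode == "HEAT":
        dtemp += 0.05 * (80.0 - temp)
    return [dtemp, (temp - 20.0) ** 2, (temp - OUT) ** 2]

def rk4_step(mode, s, h):
    k1 = flow(mode, s)
    k2 = flow(mode, [a + 0.5 * h * k for a, k in zip(s, k1)])
    k3 = flow(mode, [a + 0.5 * h * k for a, k in zip(s, k2)])
    k4 = flow(mode, [a + h * k for a, k in zip(s, k3)])
    return [a + h / 6.0 * (p + 2.0 * q + 2.0 * r + w)
            for a, p, q, r, w in zip(s, k1, k2, k3, k4)]

def extended_state(t_query, t1, t2, temp0=22.0, h=1e-3):
    """(temp, discomfort, fuel, swTear) at t_query for modes OFF, HEAT, OFF."""
    s, t = [temp0, 0.0, 0.0], 0.0
    for mode, t_stop in (("OFF", t1), ("HEAT", t2), ("OFF", t_query)):
        t_stop = min(t_stop, t_query)
        while t_stop - t > 1e-12:
            step = min(h, t_stop - t)
            s = rk4_step(mode, s, step)
            t += step
    # swTear grows by 0.5 at every switch that has occurred by t_query.
    sw_tear = 0.5 * sum(1 for ti in (t1, t2) if ti <= t_query)
    return s + [sw_tear]

def F(t1, t2, tp, tP, M=1000.0, big=2000.0):
    """Segment cost plus weighted squared mismatch of temp at tp and tP."""
    if not (0.0 <= t1 <= t2 <= tP and 0.0 <= tp < tP):
        return big  # large constant standing in for infinity
    a = extended_state(tp, t1, t2)
    b = extended_state(tP, t1, t2)
    cost = (10.0 * (b[1] - a[1]) + (b[2] - a[2]) + (b[3] - a[3])) / (tP - tp)
    return cost + M * (a[0] - b[0]) ** 2

# Times at which the closed-form solution crosses 19.6 and 20.2 from temp = 22.
t1 = 10.0 * math.log((22.0 - OUT) / (19.6 - OUT))     # OFF reaches 19.6
eq = 5.6 / 0.15                                       # HEAT equilibrium temperature
t2 = t1 + math.log((eq - 19.6) / (eq - 20.2)) / 0.15  # HEAT reaches 20.2
tp = 10.0 * math.log((22.0 - OUT) / (20.2 - OUT))     # OFF passes 20.2

closed = F(t1, t2, tp, t2)     # temp(tp) = temp(tP): no repetition penalty
mismatch = F(t1, t2, 1.0, t2)  # temp(tp) != temp(tP): heavily penalized
print(closed, mismatch, F(2.0, 1.0, 0.5, 3.0))
```

The comparison between `closed` and `mismatch` shows how the weighted distance term steers the minimizer toward segments whose end states coincide.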

Running Example 

We illustrate our technique for the running example with a
fixed sequence of modes, say $q$ = OFF, HEAT, OFF, starting from
the initial state (OFF, temp = 22, out = 16). The outside
temperature out does not change with time and remains the
same as in the initial state. Only the room temperature temp
changes with time. The switching time sequence is $t_1, t_2$. Let
$t_p$ denote the time when the thermostat enters the repetitive
behavior and $t_P$ be the time such that temp($t_p$) = temp($t_P$).
When $t_1 \le t_2 \le t_P$ and $t_p < t_P$, the function

$$F_q(t_1, t_2, t_p, t_P) = cost(qx_{[t_p, t_P]}) + 1000\,(temp(t_p) - temp(t_P))^2$$

and it is set to 2000 otherwise (approximating infinity in
the formulation with a very high constant). We use the ode45
function in MATLAB [24] for numerically simulating the
ordinary differential equations representing continuous dynamics
in each mode. In order to find the minimum value
of $F_q$ and the corresponding arguments that minimize the
function, we use an implementation of the Nelder-Mead simplex
algorithm [26]. The minimum value of $F_q$ is obtained
at

        t_0    t_1    t_2    t_p    t_P
t       0      5.02   5.24   3.54   5.24
temp    22.0   19.6   20.2   20.02  20.2

^1 We rely on simulating continuous behavior described by
ODEs in a single mode for a fixed time period and accurate
simulation of ODEs is a well-studied problem.


So, the switch states corresponding to the minimum long-run
cost for the given initial state (OFF, temp = 22, out = 16)
and the given switching sequence of modes OFF, HEAT, OFF are
gHF = {20.2} and gFH = {19.6}.

We repeat the experiments with different initial states but
with the same mode switching sequence. Even with different
initial states (OFF, temp = 20.5, out = 16), (OFF, temp =
21, out = 16) and (OFF, temp = 21.5, out = 16), we obtain
the same switching states in this example: gHF = {20.2}
and gFH = {19.6}.

When we change the mode switching sequence to OFF, HEAT,
OFF, HEAT, OFF, we discover the optimal switching sequence
to be

        t0     t1     t2     t3     t4     tp     tP
t              5.02   5.24   6.73   6.95   3.54   6.95
temp    22.0   19.6   20.2   19.6   20.2   20.2   20.2

that is, t1 = 5.02, t2 = 5.24, t3 = 6.73, t4 = 6.95, tp = 3.54, tP =
6.95, which again yields the same optimal switching states
gHF = {20.2} and gFH = {19.6}.

We observe that the optimal behavior with respect to the
given cost metric would be to switch from OFF mode to HEAT
mode at temp = 19.6 and then switch from HEAT to OFF mode
at temp = 20.2 regardless of the initial room temperature, as
long as the outside temperature out = 16. The optimal
mode cycle is between OFF and HEAT modes.

For an initial state with outside temperature higher than
the target room temperature, out > 20, the optimal cycle
would be between OFF and COOL modes. With the mode
sequence OFF, COOL, OFF and the initial state (OFF, temp =
20.5, out = 26), we discover the optimal switching states to
be gCF = {20} and gFC = {20.3}.
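
The penalized objective used in the running example of this subsection can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB setup described in the text. The names `make_objective`, `segment_cost` and `temp_at` are hypothetical, the two callbacks stand in for a numerical ODE simulator, and the feasibility test (strictly increasing switching times and tp < tP) is a simplifying assumption.

```python
def make_objective(segment_cost, temp_at, weight=1000.0, infeasible=2000.0):
    """Build F_q(t1, ..., tn, tp, tP) for a fixed mode sequence q.

    segment_cost(times, tp, tP) -> cost of the trajectory segment on [tp, tP]
    temp_at(times, t)           -> simulated room temperature at time t
    Both callbacks are hypothetical stand-ins for a numerical simulator.
    """
    def objective(args):
        *times, tp, tP = args
        ordered = all(a < b for a, b in zip(times, times[1:]))
        if not (ordered and 0.0 <= tp < tP):
            # Large constant approximating infinity for infeasible arguments.
            return infeasible
        gap = temp_at(times, tp) - temp_at(times, tP)
        # The quadratic penalty vanishes only when the temperature repeats.
        return segment_cost(times, tp, tP) + weight * gap ** 2
    return objective


# Toy stand-ins (not the thermostat dynamics): the temperature is a sawtooth
# in t and the segment cost is simply the length of the segment.
toy_temp = lambda times, t: t % 2.0
toy_cost = lambda times, tp, tP: tP - tp
F = make_objective(toy_cost, toy_temp)
```

A function built this way could be handed to any derivative-free minimizer; in Python, `scipy.optimize.minimize` with `method="Nelder-Mead"` plays a role comparable to the one `fminsearch` plays in the text.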

5.2 Finding Optimal Mode Sequence 

The algorithm above assumed that the switching mode
sequence q was fixed. It can be easily adapted to also
automatically discover the optimal switching mode sequence.
Any mode sequence starting in mode 1 and with at most k
switches in a system with N modes Q = {1, 2, ..., N} is a
subsequence of 1(2 ... N 1)^k, that is, mode 1 followed by
(2 ... N 1) repeated k times. Let the dwell-time of a mode i
be the time spent in the mode, t_{i+1} - t_i. Given the switching
times t1, t2, ..., t_{Nk} and tp, tP, we define the NZ function
which removes the switch times and modes with zero dwell-times
from the switching sequence, that is,

\[ NZ(q, t_1, t_2, \ldots, t_{Nk}, t_p, t_P) = (q', t_{i_1}, t_{i_2}, \ldots, t_{i_l}, t_p, t_P) \]

where q' = q_{i_1}, q_{i_2}, ..., q_{i_l}, 0 < t_{i_1} < t_{i_2} < ... < t_{i_l} < tP,
and t_m = t_{i_j} for all i_j < m < i_{j+1}.

For example, given the sequence of switching times 5, 6, 6,
11, 12, 12 and tp = 6.5, tP = 12.5 with the switching mode
sequence q = 1, 2, 3, 1, 2, 3, 1,

NZ(q, 5, 6, 6, 11, 12, 12, 6.5, 12.5) = (q', 5, 6, 11, 12, 6.5, 12.5)

where q' = 1, 2, 1, 2, 1.
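
The zero-dwell-time filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `nz` and the tolerance parameter `tol` are hypothetical, the first mode is assumed to be entered at time 0, and merging of identical modes that become adjacent after filtering is not handled.

```python
def nz(modes, times, tp, tP, tol=0.0):
    """Drop modes with zero dwell-time from a switching sequence.

    modes : mode sequence q, with len(modes) == len(times) + 1
    times : switching times; times[j] is the switch from modes[j] to modes[j+1]
    tp, tP: start and end of the repetitive segment (passed through unchanged)
    """
    if len(modes) != len(times) + 1:
        raise ValueError("expected one more mode than switching times")
    kept_modes, kept_times = [], []
    entry = 0.0  # time at which the current mode is entered
    for j, mode in enumerate(modes):
        exit_time = times[j] if j < len(times) else float("inf")
        if exit_time - entry > tol:  # positive dwell-time: keep this mode
            if kept_modes:
                # The switch into a kept mode happens at its entry time.
                kept_times.append(entry)
            kept_modes.append(mode)
        entry = exit_time
    return kept_modes, kept_times, tp, tP


# The example from the text: modes 1,2,3,1,2,3,1 with times 5,6,6,11,12,12.
result = nz([1, 2, 3, 1, 2, 3, 1], [5, 6, 6, 11, 12, 12], 6.5, 12.5)
```

Here `result` holds the filtered mode sequence, the remaining switching times, and the unchanged tp and tP, in that order.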

Given a guess on the number of mode switches k such that
k or fewer switches are needed to reach the optimal repetitive
behavior, we can use q = 1(2 ... N 1)^k as the over-approximate
switching mode sequence and then find the optimal
switching subsequence corresponding to the minimal
long-run cost behavior using the following modified
optimization formulation.

\[ \min F(NZ(q, t_1, \ldots, t_{Nk}, t_p, t_P)) \qquad (7) \]

If the optimal value returned by minimizing the above
function is attained with the arguments t1*, ..., t_{Nk}*, tp*, tP*,
then the optimal switching mode sequence q' and the optimal
switching time sequence are given by

\[ (q', t_{i_1}, \ldots, t_{i_l}, t_p, t_P) = NZ(q, t_1^*, \ldots, t_{Nk}^*, t_p^*, t_P^*) \]

Running Example

We illustrate the above technique on the running example
below. Let us guess that reaching the optimal repetitive
behavior from the initial state (OFF, temp = 22, out = 16)
takes at most 2 switches. We consider the mode sequence
OFF, HEAT, COOL, OFF, HEAT, COOL, OFF, which contains all
mode sequences with 2 switches (it also contains some mode
sequences with more than 2 switches). We try to minimize
the corresponding function F(NZ(q, t1, t2, ..., t6, tp, tP)).

The minimum value obtained for the function F with the
starting state (OFF, temp = 22, out = 16) by our optimization
engine corresponds to the following trajectory.

        t0     t1     t2     t3     t4     t5     t6     tp     tP
t              5.08   5.32   5.32   6.97   7.23   7.23   4.87   8.66
temp    22.0   19.6   20.2   20.2   19.6   20.2   20.2   19.7   19.7

The optimal mode sequence and the switching time points
are obtained as

NZ(q, 5.08, 5.32, 5.32, 6.97, 7.23, 7.23, 4.87, 8.66) =

(OFF, HEAT, OFF, HEAT, OFF, 5.08, 5.32, 6.97, 7.23, 4.87, 8.66)

Since tp = 4.87 and tP = 8.66, the repetitive part of the
mode sequence is HEAT, OFF. The switch from mode OFF to
HEAT occurs at times t1 and t4. We observe that temp(t1) =
temp(t4) = 19.6. So, the optimal trajectory switches from
OFF to HEAT at temp = 19.6. The switches from HEAT to
COOL and then to OFF occur at the same times: t2 = t3 and
t5 = t6. So, the dwell-time in the mode COOL is 0 and it
needs to be removed from the optimal switching sequence.
The switch into mode OFF occurs at times t3 and t6 with
temp(t3) = temp(t6) = 20.2. Thus, the optimal mode sequence
is OFF followed by repetitions of (HEAT, OFF), and the guards
discovered from this trajectory are gFH = 19.6 and gHF = 20.2. □

Thus, the approach presented so far can be used to synthesize
switching conditions for minimum cost long-run behavior
for a given initial state. We need a guess on the number
of switches k such that the optimal behavior has at most k
switches. We summarize the guarantee of our approach for
a single initial state in the following theorem.

Theorem 2. For a single initial state, our technique discovers
the switching states corresponding to the optimal trajectory
with minimum long-run cost if the numerical optimization
engine can discover the global minimum of the numerical
function F.

The proof of the above theorem follows from the definition of
F. If numerical optimization engines are guaranteed to only
find local minima of F, our technique will find trajectories
of minimal cost. We employ the Nelder-Mead simplex algorithm
as described by Lagarias et al. [21, 26] for minimizing
F since it is a derivative-free method and it can better handle
discontinuities in the function F. We use its implementation
available as the fminsearch [23] function in MATLAB.

6. MULTIPLE INITIAL STATES 

The approach presented in Section 5 can find the switching
state for each mode switch along the trajectory correspond- 
ing to optimal long-run behavior for a given initial state. 
However, since systems are generally designed to operate in 
more than one initial state, we need to synthesize the guard 
condition for mode switches such that the trajectory from 
each initial state has optimal long-run cost. In this section, 
we present a technique for synthesizing guard conditions in a 
probabilistic setting, where we assume the ability to sample 
initial states from their (arbitrary) probability distribution. 
Our technique samples initial states and obtains correspond- 
ing optimal switching states for each mode switch. From 
the individual optimal switching states, we generalize to ob- 
tain the guard condition for the mode switch using inductive 
learning (learning from examples). In order to employ learn- 
ing, we make a structural assumption on the form of guards 
and use concept learning algorithms to efficiently learn the 
guards from sampled switching states. 

Structural Assumption: 

We assume that the guard condition is a halfspace, that is, 
a linear inequality over the continuous variables X. 

In the rest of the section, we discuss how the existing results 
from algorithmic concept learning can be used efficiently to 
learn a halfspace representing the guard condition from the 
discovered switching states for each mode-switch. 
Background: We first mention results which prove the efficient
learnability of halfspaces and then present an algorithm
which can be used to learn halfspaces in the probabilistically
approximately correct (PAC) learning framework [32].
In this framework, the learner receives samples marked as
positive or negative for points lying inside and outside the
concept respectively, and the goal is to select a generalization
concept from a certain class of possible concepts such
that the selected concept has low generalization error with
very high probability. In our case, the concept class is the
set of all possible halfspaces in R^n and the concept to be
learnt is the halfspace that is the correct guard in the optimal
switching logic. The points in a concept to be learnt are
the states in the guard and the points outside the concept
are the states outside the guard.

A halfspace can be learnt with a very high accuracy using a
polynomial-sized sample [4]. We briefly summarize the relevant
results from learning theory that establish the efficient
learnability of halfspaces in the PAC learning framework.
A concept class is said to shatter a set of points if for any
classification of the points as positive and negative, there is
some concept in the class which would correctly classify the
points. Any concept class is associated with a combinatorial
parameter of the class, namely, the Vapnik-Chervonenkis
(VC) dimension, defined as the cardinality of the largest set
of points (arbitrarily labeled as positive or negative) that
the class can shatter. For example, consider the concept
class to be partitions in R^2 using straight lines, that
is, halfspaces in R^2. The straight line should separate positive
points in the true concept and negative points outside
the concept. There exist sets of 3 points that can indeed
be shattered using this model; in fact, any 3 points that
are not collinear can be shattered, no matter how one labels
them as positive or negative. However, it can be shown
using Radon's theorem that no set of 4 points can be shattered
[14]. Thus, the VC dimension of straight lines is 3.
In general, the VC dimension for halfspaces in R^n is known
to be n + 1 [3]. The following theorem from Blumer et
al. [5] establishes the relation between efficient learnability
of a concept class in the PAC learning framework and the VC
dimension of the concept class.

Theorem 3. Let C be a concept class with a finite VC
dimension d. Then, any concept in C can be learnt in the
following sense: with probability at least 1 - δ, a concept c
is learnt which incorrectly labels a point with a probability of
at most ε, where c is generated using a random sample of
labeled points of size at least

\[ \max\left( \frac{4}{\epsilon} \log \frac{2}{\delta}, \; \frac{8d}{\epsilon} \log \frac{13}{\epsilon} \right) \]



Since the VC dimension for the class of halfspaces in R^n is
n + 1, a halfspace can be learnt in the PAC learning framework
using a sample of size at least

\[ \max\left( \frac{4}{\epsilon} \log \frac{2}{\delta}, \; \frac{8n + 8}{\epsilon} \log \frac{13}{\epsilon} \right) \]

Thus, learning halfspaces in R^n requires a number of samples
polynomial in n, 1/ε and 1/δ, and increasing the probabilistic
accuracy of the learnt halfspace requires only a polynomial
increase in the number of samples. This is critical for
efficiently learning guards in our algorithm.
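
To give a feel for the magnitudes involved, the bound above can be evaluated numerically. The sketch below is only an illustration: the function names are hypothetical, and the logarithm is assumed to be base 2, which the text does not state explicitly.

```python
import math


def pac_sample_size(epsilon, delta, vc_dim):
    """Smallest integer sample size satisfying the bound of Theorem 3.

    Assumes base-2 logarithms; epsilon is the error bound and delta the
    failure probability, both strictly between 0 and 1.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    confidence_term = (4.0 / epsilon) * math.log2(2.0 / delta)
    dimension_term = (8.0 * vc_dim / epsilon) * math.log2(13.0 / epsilon)
    return math.ceil(max(confidence_term, dimension_term))


def halfspace_sample_size(epsilon, delta, n):
    """Bound for halfspaces in R^n, whose VC dimension is n + 1."""
    return pac_sample_size(epsilon, delta, n + 1)


# Example: halfspaces in the plane with epsilon = 0.1 and delta = 0.05.
size_plane = halfspace_sample_size(0.1, 0.05, 2)
```

Halving epsilon or adding a dimension increases the result only polynomially, which is the property the text relies on.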

Halfspace learning algorithm: We first discuss an algorithm
HSinfer which can be used to learn halfspaces in the PAC
framework from a given sample and then describe the switching
logic synthesis algorithm. In R^n, a halfspace is given by
θ·X + θ0 > 0 where θ ∈ R^n, θ0 ∈ R and X is any point in R^n,
which satisfies the above inequality if and only if the point
is in the concept halfspace to be learnt. For any point Xi,
let Yi be 1 if Xi is in the concept and -1 if it is outside
the concept. The algorithm below is the standard Perceptron
Learning algorithm and is known to converge after k
iterations, where k < (max_i ||X_i|| / min_i (Y_i (θ·X_i + θ0) / ||θ||))^2 [11].



Input: Set of labelled sample points {(X_i, Y_i)}
Output: θ, θ0 such that θ·X + θ0 > 0 is the halfspace
Set θ^0 = 0, θ0^0 = 0, t = 0;
for each i do
    if θ^0·X_i + θ0^0 > 0 then
        Predicted y_i = 1
    else
        Predicted y_i = -1
    end
end
while some i has Y_i ≠ y_i do
    pick some i with Y_i ≠ y_i;
    θ^{t+1} := θ^t + Y_i X_i;
    θ0^{t+1} := θ0^t + Y_i;
    t := t + 1;
    for each i do
        if θ^t·X_i + θ0^t > 0 then
            Predicted y_i = 1
        else
            Predicted y_i = -1
        end
    end
end
return θ, θ0

Algorithm 1: Halfspace learning algorithm HSinfer [11]
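
Algorithm 1 can be written as a short runnable sketch. The Python version below follows the same update rule; the function name `hs_infer`, the `max_updates` cap and the sample points are assumptions made for illustration and are not part of the original algorithm.

```python
def hs_infer(samples, max_updates=100000):
    """Perceptron learning of a halfspace theta . x + theta0 > 0.

    samples     : list of (x, y) with x a tuple of floats and y in {+1, -1}
    max_updates : safety cap (an assumption) in case the data is not separable
    """
    n = len(samples[0][0])
    theta = [0.0] * n
    theta0 = 0.0

    def predict(x):
        score = sum(w * xi for w, xi in zip(theta, x)) + theta0
        return 1 if score > 0 else -1

    for _ in range(max_updates):
        # Pick some sample whose predicted label disagrees with its true label.
        wrong = next(((x, y) for x, y in samples if predict(x) != y), None)
        if wrong is None:
            return theta, theta0
        x, y = wrong
        # Perceptron update: move the halfspace towards the misclassified point.
        theta = [w + y * xi for w, xi in zip(theta, x)]
        theta0 += y
    raise RuntimeError("no separating halfspace found within max_updates")


# Hypothetical sample in the plane, separable by x1 + x2 > 3.
points = [((3.0, 2.0), 1), ((2.0, 3.0), 1), ((4.0, 4.0), 1),
          ((0.0, 0.0), -1), ((1.0, 0.0), -1), ((0.0, 1.0), -1), ((1.0, 1.0), -1)]
theta, theta0 = hs_infer(points)
```

On separable data the loop terminates with a halfspace that labels every sample point correctly; the particular halfspace returned depends on the order in which misclassified points are picked.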



Learning guards: We now describe the algorithm to learn
guards for multiple initial states using the technique presented
in Section 5 and the halfspace learning algorithm
HSinfer. The algorithm simply involves finding optimal
switching points for each mode-switch and then using half- 
space learning to infer the guards. The key idea is to use 
the optimal switching states as positive points for the con- 
cept learning problem and the non-optimal states explored 
during optimization (which preceded the optimal switching 
states along any trajectory) as negative points since these 
states cannot be in the guard for an optimal switching logic. 

Input: MDS(X, Q), initial states I, tolerance of
generalization error δ and maximum probability
of error ε

Output: Optimal switching logic SL^opt

1. Sample initial states from I for the provided δ, ε.

2. For each initial state, obtain the optimal trajectory in
MDS(X, Q) and the switching states for the mode switches
along the trajectory.

3. Label the obtained switching states as positive
points.

4. Label the states preceding switching states along any
trajectory as negative points.

5. Using the obtained sample of positive and negative
states, learn the guard for each mode switch as a
generalization of these states using HSinfer.

6. Output these guards as the synthesized optimal
switching logic.

Algorithm 2: Finding optimal switching logic SL^opt with
multiple initial states
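
Steps 3 and 4 of Algorithm 2 amount to turning each optimal trajectory into labeled examples for one mode switch. A minimal sketch of this labeling step is given below; the function name `label_states` and the trajectory representation (a list of visited continuous states plus the index of the switching state) are assumptions made for illustration.

```python
def label_states(trajectories):
    """Build a labeled sample for one mode switch.

    trajectories: list of (states, switch_index) pairs, where states is the
    list of continuous states visited in the source mode (in time order) and
    states[switch_index] is the optimal switching state found by optimization.
    """
    samples = []
    for states, k in trajectories:
        # The optimal switching state is a positive example for the guard.
        samples.append((tuple(states[k]), 1))
        # States visited before the switch cannot lie in an optimal guard.
        samples.extend((tuple(s), -1) for s in states[:k])
    return samples


# Hypothetical OFF-to-HEAT data in (temp, out) coordinates.
sample = label_states([([(22.0, 16.0), (21.0, 16.0), (19.6, 16.0)], 2)])
```

The resulting list has the (point, label) shape expected by a halfspace learner such as the perceptron of Algorithm 1.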

Guarantees: Under the structural assumption that guards
are halfspaces, our PAC learning algorithm computes
guards with probability at least 1 - δ such that the
probability that a guard contains any state which is not a
switching state or misses any switching state is at most ε.
Further, the guards inferred by the above algorithm can be
made probabilistically more and more accurate by choosing
suitable values of ε, δ and considering correspondingly
larger and larger samples of initial states as given by
Theorem 3. For a trajectory to be non-optimal, at least
one switching point along the trajectory needs to be
classified incorrectly. Thus, the following theorem establishes
the probabilistic guarantees of our switching logic synthesis
algorithm.

Theorem 4. Given an MDS(X, Q), using random sampling
from the set of initial states with a sample size polynomial
in n, 1/ε and 1/δ, Algorithm 2 synthesizes a switching
logic SwL with probability at least 1 - δ such that any
trajectory in the synthesized hybrid system HS(MDS, SwL) is not
optimal with probability at most mε, where m is the number
of guards in the switching logic, that is, m = |SwL| ≤ |Q|^2,
and n is the number of variables, that is, n = |X|.
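
One way to read the factor m in Theorem 4 is as a union bound over the guards. The following is a sketch under the assumption that each of the m guards is learnt with error at most ε; the events E_j are hypothetical notation introduced here, with E_j denoting that guard j misclassifies a switching point along the trajectory.

```latex
\[
\Pr[\text{trajectory is not optimal}]
  \;\le\; \Pr\Big[\bigcup_{j=1}^{m} E_j\Big]
  \;\le\; \sum_{j=1}^{m} \Pr[E_j]
  \;\le\; m\,\epsilon
\]
```

The first inequality uses the observation that a trajectory can only be non-optimal if at least one guard misclassifies a switching point.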

Running Example 

The set of initial states is given by 16 < temp < 26 and out ∈
{16, 26}. The set of initial states is partitioned into subsets
where each subset is a 0.1 interval of room temperature temp
and the outside temperature is 16 or 26. The guards discovered
are: gHF : temp > 20.2 ∧ out = 16, gFH : temp <
19.6 ∧ out = 16, gCF : temp < 20.0 ∧ out = 26, gFC :
temp > 20.3 ∧ out = 26.

7. CASE STUDIES 

Apart from the running example of Thermostat controller, 
we applied our technique to two other case studies: (i) an Oil 
Pump Controller, which is an industrial case study from [7] , 
and (ii) a DC-DC Buck-Boost Converter, motivated by the 
problem of minimizing voltage fluctuation in a distributed 
aircraft power system. We employ the implementation of the
Nelder-Mead simplex algorithm as described by Lagarias et
al. [21, 26], available as the fminsearch [23] function
in MATLAB, for numerical optimization. A more detailed
discussion of experiments is available in an extended
version [17].

7.1 Thermostat Controller 

If we change the cost metric in the thermostat controller to

\[ \lim_{t \to \infty} \frac{\mathrm{discomfort}(t) + \mathrm{fuel}(t) + \mathrm{swTear}(t)}{\mathrm{time}(t)} \]

giving equal weight to all the three penalties (instead of the
10 : 1 : 1 weight ratio used earlier), the optimal switching
logic discovered with this cost metric is: gHF : temp > 20.0 ∧
out = 16, gFH : temp < 18.8 ∧ out = 16, gCF : temp < 21.9 ∧
out = 26, gFC : temp > 22.7 ∧ out = 26. We observe that the
room temperature oscillates closer to the target temperature
when the discomfort penalty is given relatively higher weight
in the cost metric. This case study illustrates that a designer
can suitably define a cost metric which reflects their priorities
and, then, our technique can be used to automatically
synthesize switching logic for the given cost metric.

7.2 Oil Pump Controller 

Our second case study is an Oil Pump Controller, adapted
from the industrial case study in [7]. The example consists
of three components - a machine which consumes oil in a
periodic manner, a reservoir containing oil, an accumulator
containing oil and a fixed amount of gas in order to put
the oil under pressure, and a pump. The simplification we
make is to use a periodic function to model the machine's
oil consumption and we do not model any noise (stochastic
variance) in oil consumption.

The state variable is the volume V of oil in the accu- 
mulator. The system has two modes: mode ON when the 
pump is ON and mode OFF when the pump is OFF. Let 
the rate of consumption of oil by the machine be given by 
m = 3 * (cos(t) + 1) where t is the time. The rate at which
oil gets filled in the accumulator is p, with p = 4 when the pump
is on and p = 0 when the pump is off. The change in volume
of oil in the accumulator is given by the following equation

dV/dt = p - m where p and m take different values depending
on the mode of operation of the pump. For synthesis, we 
consider two different sets of requirements [7]. 

In the first set of requirements, the volume of oil in the
tank must be within some safe limit, that is, 1 < V < 8,
and the average volume of oil in the accumulator should be
minimized. We model these requirements using our cost
definition by defining one penalty variable p1 and one reward
variable r1. Let the evolution of penalty p1 be dp1/dt = V if
1 < V < 8, M otherwise, where M is a very large (M > lO^pi)
constant (10® in our experiments), and that of reward r1 be
dr1/dt = 1. Minimizing the cost function cost1 = lim_{t→∞} p1(t)/r1(t)
minimizes the average volume lim_{t→∞} (1/t) ∫_0^t V(s) ds and also
enforces the safety requirement ∀t. 1 < V(t) < 8.

In the second set of requirements, we add an additional
requirement to those in the first set. We require that the
oil volume is mostly below some threshold Vhigh = 4.5
in the long run. We model this requirement by adding an
additional penalty and an additional reward variable p2 and r2
with evolution functions: dp2/dt = 1 if V > Vhigh, 0 otherwise,
and dr2/dt = 1 if V < Vhigh, 0 otherwise. The new cost function
is cost2 = lim_{t→∞} (p1(t)/r1(t) + p2(t)/r2(t)). Let thigh be the total
duration when the volume is above Vhigh and tlow be the
duration that it is below Vhigh. Minimizing p2/r2 = thigh/tlow
would ensure that we spend more time with volume less than
Vhigh in the accumulator.

The guards gFN from OFF to ON and gNF from ON to
OFF obtained for the above cost1 objective are gFN : V <
3.71, gNF : V > 4.62, and for the cost2 objective are gFN : V <
4.07, gNF : V > 4.71.

We simulate from an initial state V = 4 and the behavior
for both objectives is presented in Figure 2(a). In both cases,
the behavior satisfies the safety property that the volume is
within 1 and 8. Since we minimize oil volume, the volume is
close to the lower limit of 1. We also observe that using the
second cost metric causes a decrease in the duration of time when
the oil volume is higher than 4.5, but the average volume of
oil increases. This illustrates how designers can use different
cost metrics to better reflect their requirements.

7.3 DC-DC Buck-Boost Converter 

In this case study, we synthesize switching logic for
controlling DC-DC buck-boost converter circuits described in [15].
The circuit diagram for the converter and the parameters
used are presented in the full version [17]. The goal is to
maintain the output voltage V_R across a varying load R at
some target voltage Vd. The converter can be modeled as
a hybrid system with three modes of operation. The state
space of the system is X = (iL, vC), where iL is the current
through the inductor and vC is the load voltage. The
dynamics in the three modes are given by the state space
equation dX/dt = A_k X + B_k E, where k = 1, 2, 3 is the mode
and E is the input voltage. The coefficients of the equations

Figure 2: Case Studies. (b) DC-DC Boost Converter
We mention two key performance requirements of the DC-DC
Boost Converter described in [15]. The first requirement
is that the converter be resilient to load variations. The
second requirement is to minimize the variance of the voltage
across the load V_R from the target voltage. This variance
is called the ripple voltage. We define the penalty variable p1
with the following evolution function: dp1/dt = (V_R - Vd)^2.
We want to minimize the average deviation from the target
voltage. So, we define the reward variable r1 with dr1/dt = 1.
The cost function is cost = lim_{t→∞} p1(t)/r1(t). This minimizes
the average variance of V_R from the target voltage Vd. This
corresponds to minimizing the ripple voltage. Since the load
R also changes periodically, it also minimizes the transient
variance in voltage.

Given the dynamics in each of the three modes and the
cost function, the synthesis problem is to automatically
synthesize the guards g12, g23, g31 which minimize the cost. We
are given the over-approximation of the guard g23 : iL = 0.
The guards obtained are as follows: g12 : iL > 1.9, g23 :
iL = 0 and g31 : vC > 4.6. The system remains in the
first mode until the inductor current reaches the reference
current Iref. The system remains in the second mode until
the inductor current becomes 0. Then, the system switches
to the third mode where it remains as long as the capacitor
voltage remains over the reference voltage Vref. We simulate
the synthesized system and the behavior is shown in
Figure 2(b).



8. CONCLUSION 

In this paper, we present an algorithm for automated syn- 
thesis of switching logic in order to achieve minimum long- 
run cost. Our algorithm is based on reducing the switching 
logic synthesis problem to an unconstrained numerical op- 
timization problem which can then be solved by existing 
optimization techniques. We also give a learning-based ap- 
proach to generalize from a sample of switching states to a 
switching condition, where the learnt condition is optimal 
with high probability. 